
Unblock Llama2 ONNX export w/ sdpa by falling back to manual impl #28823

Closed

Conversation

@BowenBao (Contributor) commented Feb 1, 2024

What does this PR do?

Unblocks Llama2 ONNX export with SDPA by falling back to the manual attention implementation when no attention_mask is provided during tracing. Without this change, torch.jit.trace fails with:

ValueError: Attention using SDPA can not be traced with torch.jit.trace when no attention_mask is provided. To solve this issue, please either load your model with the argument attn_implementation="eager" or pass an attention_mask input when tracing the model.

Fixes #28610
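
For context, here is a minimal sketch of the two workarounds the error message itself suggests. The checkpoint name and inputs are illustrative assumptions, not part of this PR:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumption: any Llama2 checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello world", return_tensors="pt")

# Workaround 1: load with eager attention so the SDPA tracing check never fires.
eager_model = AutoModelForCausalLM.from_pretrained(
    model_id, attn_implementation="eager", torchscript=True
).eval()
traced = torch.jit.trace(eager_model, (inputs["input_ids"],))

# Workaround 2: keep SDPA but pass an explicit attention_mask while tracing.
sdpa_model = AutoModelForCausalLM.from_pretrained(
    model_id, attn_implementation="sdpa", torchscript=True
).eval()
traced = torch.jit.trace(sdpa_model, (inputs["input_ids"], inputs["attention_mask"]))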

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@fxmarty

@thiagocrepaldi left a comment:

LGTM. Maybe add a unit test for the torch.jit.trace case?
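
A hypothetical sketch of the kind of unit test the reviewer suggests (the tiny test checkpoint and scaffolding are assumptions, not the PR's actual test):

import torch
from transformers import AutoModelForCausalLM

def test_jit_trace_sdpa_without_attention_mask():
    # assumed tiny Llama test checkpoint, chosen to keep the test fast
    model = AutoModelForCausalLM.from_pretrained(
        "hf-internal-testing/tiny-random-LlamaForCausalLM",
        attn_implementation="sdpa",
        torchscript=True,
    ).eval()
    input_ids = torch.randint(0, model.config.vocab_size, (1, 8))
    # with the fallback from this PR, tracing without an attention_mask
    # should no longer raise the SDPA ValueError
    traced = torch.jit.trace(model, (input_ids,))
    assert traced is not None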

@ArthurZucker (Collaborator) left a comment:

Thanks for this! It's going to be a bit hard to merge this as is. Would you mind checking whether #27931 fixes the issue? It should be merged first and should simplify all of this logic.

@BowenBao (Contributor, Author) commented Feb 3, 2024

Hi @ArthurZucker, I have validated that the issue is fixed by your PR, thanks! Do you have an ETA for when it will be merged? Our workstreams have been blocked by this issue for a while, and we need to resolve this export problem ASAP.

@ArthurZucker (Collaborator) commented Feb 5, 2024

This week 😉 Waiting for @gante's green light, and then I will merge #27931 (that wasn't clear earlier).

@@ -673,12 +673,22 @@ def forward(
         output_attentions: bool = False,
         use_cache: bool = False,
     ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
+        _jit_tracing = torch.jit.is_tracing()
A contributor commented on the added line:
This means that we call torch.jit.is_tracing as many times as there are layers.
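
A minimal toy sketch of the hoisting this comment implies: query torch.jit.is_tracing() once in the top-level forward and thread the result down, rather than calling it inside every layer. Class and argument names here are illustrative, not the actual transformers code:

import torch
from torch import nn

class ToyDecoderLayer(nn.Module):
    def forward(self, hidden_states, use_manual_attn: bool = False):
        # a trace-safe manual attention path would be selected here when
        # use_manual_attn is True; the attention math is elided for brevity
        return hidden_states

class ToyModel(nn.Module):
    def __init__(self, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(ToyDecoderLayer() for _ in range(num_layers))

    def forward(self, hidden_states):
        jit_tracing = torch.jit.is_tracing()  # queried once, not once per layer
        for layer in self.layers:
            hidden_states = layer(hidden_states, use_manual_attn=jit_tracing)
        return hidden_states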

@fxmarty (Contributor) commented Feb 5, 2024

I don't understand why this change is necessary. The error that is normally raised

ValueError: Attention using SDPA can not be traced with torch.jit.trace when no attention_mask is provided. To solve this issue, please either load your model with the argument attn_implementation="eager" or pass an attention_mask input when tracing the model.

explicitly gives a solution.

@thiagocrepaldi commented:

@ArthurZucker @BowenBao I believe we can close this PR now that #27931 has been merged.

Successfully merging this pull request may close these issues:

  • ONNX export failure for models invoking SDPA attention (#28610)